This report was prepared for ETC5521 Assignment 1 by Team Quokka, comprising Dea Avega Editya, Siyi Li, Vinny Vu, and Aryan Jain.
Juneteenth marks a significant event for African Americans: on June 19, 1865, Union Gen. Gordon Granger issued an order in Galveston, Texas, officially freeing the enslaved African Americans there (Lockhart 2018). Although this event occurred more than 150 years ago, issues around the mistreatment of African Americans persist. Today we still see protests and pleas against the violence and mistreatment experienced by the black community, including the #BlackLivesMatter protests following the death of George Floyd (“Black Lives Matter” 2020). We are therefore motivated to explore the history of slavery in the United States of America (USA) and its significance in Black American history. This analysis uses data sourced from the TidyTuesday GitHub repository (https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-06-16/readme.md) (Rfordatascience 2020).
This report has several limitations:
1. The data sets contain relatively few observations; for example, the african_names records only cover slaves who were liberated during their voyage. Hence, they may not fully capture the situation across the history of slavery.
2. The census data only capture US demographics from 1790 to 1870, a short window given that slavery existed long before the census period. In addition, the West region only has census data for 1850 and 1860.
3. Some of the data contain NA values and errors, which are omitted during data exploration.
Our data sets are retrieved from the GitHub repository of the TidyTuesday project, which sources them originally from the US Census Archives, Slave Voyages, and BlackPast (Rfordatascience 2020).
There are four data sets in the TidyTuesday repository: blackpast, census, slave_routes and african_names.
Note that the gender category in african_names is not categorized consistently; it is cleaned in section 3.1.
The wrangling process groups some variables in african_names to obtain aggregate counts for each category, which enables comparison across categories. We also recalculate the total population in the census data set, since the existing total is incorrect for the West region (we found this miscalculation after visualizing the data). The population proportions computed from these recalculated totals are used to track slavery exploitation in the USA.
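The total recalculation can be sketched in Python (the report itself used R). This is a minimal sketch assuming each census row carries white, black_free, black_slaves and a reported total, and that the correct total is the sum of those three groups; the function name and the toy numbers in the check are hypothetical, not census figures.

```python
def recompute_totals(rows):
    """Return copies of census rows with the total rebuilt from its parts.

    Assumption: total = white + black_free + black_slaves. Each row also
    gets the share of each group and a flag for a mismatched reported total.
    """
    out = []
    for row in rows:
        total = row["white"] + row["black_free"] + row["black_slaves"]
        new = dict(row, total_recalc=total)
        for col in ("white", "black_free", "black_slaves"):
            # Proportions use the recalculated total, not the reported one.
            new[f"prop_{col}"] = row[col] / total if total else 0.0
        new["total_mismatch"] = row["total"] != total
        out.append(new)
    return out
```

Flagging mismatches, rather than silently overwriting, keeps a record of which rows (here, the West region) had an incorrect reported total.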
Furthermore, this report takes advantage of the comprehensive record of African-American historical events in the blackpast data set to analyze which regions of the USA were unfriendly to African-American people, based on the unfortunate events recorded.
Using all of these data sets, this report seeks a brief answer to its main question: What can we learn from the history of slavery in the USA and its prominence in today’s issues surrounding racism towards African-Americans?
To answer the main question, we first look at these secondary questions:
1. What were the demographics of black slaves?
2. Which region of the USA most exploited the practice of slavery?
3. Which regions were unfriendly to black people?
4. How does the exploitation of African slaves compare across time during the slavery period?
5. Which subjects were most relevant in the slavery and post-slavery eras?
6. Which states were most active in spreading awareness after the abolition of slavery?
References for the data set sources:
1. TidyTuesday (https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-06-16/readme.md) (Rfordatascience 2020)
2. BlackPast (https://www.blackpast.org/african-american-history-timeline/) (BlackPast 2020)
3. US census data (https://www.census.gov/content/dam/Census/library/working-papers/2002/demo/POP-twps0056.pdf)
In this section, we extract information on the demographics of slaves from the african_names data set. Analyzing these data gives an initial picture of the enslavement practice before we move on to further analysis in the following sections.
We group slaves by id (since names are not always unique), gender and age, and plot the gender categories as a bar chart in figure 4.1. The plot shows that men make up the largest share of slaves in the observed data, followed by boys and then women, while girls make up the smallest share.
Figure 4.1: Composition Based on Gender
In general, we can see that the male proportion is significantly larger than the female proportion. However, figure 4.2 also reveals that children make up almost one-third of the total slave population.
Figure 4.2: Maturity of Slaves
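The grouping behind figures 4.1 and 4.2 can be sketched in Python (the report itself used R). This is a minimal sketch assuming records with id and gender fields; the label mapping is hypothetical, because the report's actual clean-up rules live in its section 3.1. Each id is counted once, and children are taken to be the Boy and Girl categories.

```python
from collections import Counter

# Hypothetical clean-up: labels outside this map (e.g. Male/Female, whose
# handling the report leaves to section 3.1) are skipped in this sketch.
GENDER_MAP = {"man": "Man", "woman": "Woman", "boy": "Boy", "girl": "Girl"}
CHILDREN = {"Boy", "Girl"}


def gender_composition(records):
    """Return (share of each gender category, share of children)."""
    seen = {}
    for rec in records:
        label = GENDER_MAP.get((rec.get("gender") or "").strip().lower())
        # Count each id once: ids are unique even when names are not.
        if label is not None and rec["id"] not in seen:
            seen[rec["id"]] = label
    counts = Counter(seen.values())
    n = sum(counts.values())
    shares = {k: v / n for k, v in counts.items()} if n else {}
    child_share = sum(shares.get(k, 0.0) for k in CHILDREN)
    return shares, child_share
```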
Looking at the age distribution of slaves in figure 4.3, we can observe that the average age of adult slaves was between 23 and 26, with the majority younger than 35. This range of productive ages is not surprising, since slaves were brought into the country mainly for manual labor, according to information from blackpast.org.
Figure 4.3: Age Distribution
There are, however, some outliers, such as a 77-year-old man and a 70-year-old woman, ages hardly suited to manual labor. Some slaves were also as young as 5 months, possibly because they were brought along with their enslaved parents.
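The adult age summary discussed above can be sketched in Python (the report itself used R). This is a minimal sketch assuming cleaned records where adults are the Man and Woman categories and age is in years; the function name and the threshold parameter are hypothetical. It reports the mean, median, share below 35, and the age range, which surfaces outliers such as the oldest slaves.

```python
import statistics

ADULT = {"Man", "Woman"}  # assumed adult categories after gender clean-up


def adult_age_summary(records, threshold=35):
    """Summarise adult ages; records with a missing age are skipped."""
    ages = [r["age"] for r in records
            if r.get("gender") in ADULT and r.get("age") is not None]
    if not ages:
        return None
    return {
        "mean": statistics.mean(ages),
        "median": statistics.median(ages),
        "share_under": sum(a < threshold for a in ages) / len(ages),
        "min": min(ages),
        "max": max(ages),
    }
```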
Figure 4.4: Composition of White and Black people in the US by region, with the census year on the x-axis and proportion of people on the y-axis. From the plot we can see that the South region stands out as having the largest proportion of slaves across all regions.
To answer the question of which region most significantly exploited slavery, we do not look at the raw number of black slaves in each region of the USA. Instead, we explore the trend in the proportions of three categories (white, black slaves, and black free), as seen in figure 4.4. The figure does not cover the 1870 census, since slavery had already been abolished in 1865.
From the plot, we note that the South region has the most contrasting pattern, distinguishing it from the other regions. The proportion of white people in the South decreases slightly from 1800 to 1860, whereas the proportion of black slaves grows, reaching more than a quarter of the region’s total population.
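The regional trend in figure 4.4 can be sketched in Python (the report itself used R). This is a minimal sketch assuming census rows with region, year, white, black_free and black_slaves columns, possibly split into several divisions per region; the function name and toy values are hypothetical. Rows are summed per region and year, and census years after 1860 are dropped.

```python
from collections import defaultdict

GROUPS = ("white", "black_free", "black_slaves")


def regional_trend(rows, last_year=1860):
    """Proportion of each group per (region, year), up to last_year."""
    sums = defaultdict(lambda: dict.fromkeys(GROUPS, 0))
    for row in rows:
        if row["year"] > last_year:
            continue  # the 1870 census postdates abolition
        acc = sums[(row["region"], row["year"])]
        for g in GROUPS:
            acc[g] += row[g]
    trend = {}
    for key, acc in sorted(sums.items()):
        total = sum(acc.values())
        trend[key] = {g: acc[g] / total for g in GROUPS} if total else {}
    return trend
```

Summing counts before dividing weights each division by its population, so a small division cannot dominate the regional proportion.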
Figure 4.5: Proportion of White and Black people by division of the South region of the US, with census year on the x-axis and proportion of people on the y-axis. From the plot we can see that the proportion remains fairly stable in the South Atlantic division, the East South Central division experiences an increasing proportion of black slaves across the period, whereas the West South Central division experiences the opposite trend.
Having spotted this pattern, we explore the South region further to examine slavery at the division level. The South region consists of three divisions: South Atlantic, East South Central and West South Central. Figure 4.5 shows clearly that the East South Central division has a more progressive pattern of slavery exploitation during the observed period, whereas the other divisions show a more stable pattern. Nevertheless, all these divisions have a similar proportion of black slaves in the last census year of the slavery period (1860).
In this section, we connect the slavery and post-slavery periods in the USA through events recorded in the blackpast data set before answering our main question. For this analysis, we first filter the data set to include only the USA. We then keep only meaningful words by removing English stop words with the tidytext R package (Silge and Robinson 2016). Using these essential words, we identify sentiments with the NRC sentiment lexicon (Mohammad 2018) and group the sentiments by region.
To link this analysis with the previous section, we add the region corresponding to each state name in the blackpast data set. For example, states such as Connecticut, New York, and New Jersey are assigned to the Northeast. In addition, we focus on all events related to slavery and racist behaviour towards African-Americans, represented by subjects such as “Slave Laws”, “Slave Labor”, “Racial Restrictions”, “Racial Violence”, “Resistance to Enslavement”, “The Slavery Controversy”, and “Antebellum Slavery”.
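The text pipeline above (filter to the USA, tokenize, drop stop words, join with a sentiment lexicon, group by region) can be sketched in Python; the report itself used tidytext in R. This is a minimal sketch under loud assumptions: STOP_WORDS and NRC_SAMPLE are tiny hypothetical stand-ins for tidytext’s stop word list and the NRC lexicon, STATE_REGION covers only a few states, and the country label "United States" is assumed rather than confirmed from the data.

```python
import re
from collections import Counter, defaultdict

# Tiny hypothetical stand-ins, NOT the real resources.
STOP_WORDS = {"the", "a", "of", "in", "and", "to", "was", "were"}
NRC_SAMPLE = {
    "violence": {"anger", "fear", "negative"},
    "riot": {"anger", "fear", "negative"},
    "freedom": {"joy", "positive", "trust"},
}
# Hypothetical state-to-region lookup, following the report's example.
STATE_REGION = {"Connecticut": "Northeast", "New York": "Northeast",
                "New Jersey": "Northeast", "Georgia": "South"}
# Subjects listed in the report.
SLAVERY_SUBJECTS = {"Slave Laws", "Slave Labor", "Racial Restrictions",
                    "Racial Violence", "Resistance to Enslavement",
                    "The Slavery Controversy", "Antebellum Slavery"}


def sentiment_by_region(events, country="United States"):
    """Count lexicon sentiments per region for slavery-related US events."""
    counts = defaultdict(Counter)
    for ev in events:
        if ev.get("country") != country or ev.get("subject") not in SLAVERY_SUBJECTS:
            continue
        region = STATE_REGION.get(ev.get("state"))
        if region is None:
            continue  # state missing or not in the lookup
        for word in re.findall(r"[a-z']+", ev["event"].lower()):
            if word in STOP_WORDS:
                continue
            for sentiment in NRC_SAMPLE.get(word, ()):
                counts[region][sentiment] += 1
    return counts
```

A word can carry several sentiments in the NRC lexicon, so one word may add to several counts, which is also how a tidy join with the lexicon behaves.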
Figure 4.6: Sentiment Analysis in Regions, with the sentiments of the words on the y-axis and the count of words on the x-axis. From the plot we can see that most of the words associated with black slavery are negative in nature: negative, fear and anger are the main sentiments, with trust the lowest. We also see that the South region contains most of the text analysed.
Figure 4.7: Negative Nuances from event statements, with region on the x-axis and count of words on the y-axis. From the plot we can see that the South dominates the count of words with fear and negative sentiments.
Using these methods, we can observe the occurrence of bad events in US history, such as racial restrictions and racial violence. According to the analysis in figure 4.6, most of the observed events carry negative, fear and anger sentiments. Moreover, most recorded events occurred in the South region. This is not surprising given the earlier finding that the South was the region that most exploited slavery.
We then focus on bad events, represented by the sentiments “disgust”, “fear”, “negative” and “sadness” (excluding positive sentiments such as joy), and plot them as a bar chart in figure 4.7. Using these negative sentiments, we can see that the South and Northeast regions were the most “unfriendly” towards African American people, since together they account for most of the bad events in the observed data. On the other hand, the West region may have been a better place to live for African Americans, as very few bad incidents were recorded there.
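The ranking behind figure 4.7 can be sketched in Python (the report itself used R). This is a minimal sketch assuming a mapping from region to sentiment counts, the kind of table behind figure 4.6; the function name and toy counts are hypothetical. Only the four negative sentiments named above are summed, so positive sentiments such as joy do not affect the ranking.

```python
from collections import Counter

NEGATIVE = ("disgust", "fear", "negative", "sadness")


def unfriendliness_ranking(counts_by_region):
    """Rank regions by their total count of negative-sentiment words."""
    totals = {region: sum(c.get(s, 0) for s in NEGATIVE)
              for region, c in counts_by_region.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```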
To explore the exploitation of African slaves for all countries across time during the slavery period, we analyse the slave_routes data set, which contains 85% of all voyages that embarked captives (Rfordatascience 2020). Note that data are missing, as the data set only contains details of the Africans who arrived alive at the final port. Many slaves died in transit, and details of some slave voyages were not fully recorded (Rfordatascience 2020).
Figure 4.8 shows a line graph of the total slaves arriving by voyage each year, with a smoothed line showing the trend across time. From the plot we can see an overall upward trend in the number of slaves arriving from 1700 until 1800, followed by a sharper downward trend approaching zero toward the end of the period. Yearly arrivals peak at 79,472 in 1829 and fall to 700 in 1866. The plot also shows sharp fluctuations in the total arrivals; these may, however, result from missing data, which we assess next.
Figure 4.8: Line graph showing the total slaves arriving by voyage per year, with year of arrival on the x-axis and total slaves arriving per year on the y-axis
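The yearly totals and trend line can be sketched in Python (the report itself used R). This is a minimal sketch assuming voyage records with year_arrival and n_slaves_arrived fields. The smoother in figure 4.8 was presumably a ggplot2 smoother; here we swap in a simpler centred moving average, so the curve will not match the figure exactly. The function names are hypothetical.

```python
from collections import defaultdict


def arrivals_per_year(voyages):
    """Sum n_slaves_arrived by year_arrival, skipping missing values."""
    totals = defaultdict(int)
    for v in voyages:
        year, n = v.get("year_arrival"), v.get("n_slaves_arrived")
        if year is None or n is None:
            continue  # a record without a year or count cannot be placed
        totals[year] += n
    return dict(sorted(totals.items()))


def moving_average(values, window=5):
    """Centred moving average; the window shrinks at the series edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

Years with no recorded voyages are absent from the totals rather than zero, so the moving average runs over consecutive recorded years; this is one reason missing data can distort the apparent trend.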
Figure 4.9 shows histograms of the number of voyages per year in plot A and per decade in plot B. Plot A shows sharp dips in the number of voyages around 1745, 1780, 1809 and 1832, mirroring the dips in figure 4.8. Using the blackpast data set, we can identify significant events around those years to judge whether these dips are genuine or result from missing data. In 1745, 1780 and 1832 there are no events that would drastically reduce the number of voyages; in fact, we see events detrimental to African Americans, such as bans on teaching Africans to read and write and restrictions on the right to preach. These dips are therefore likely due to missing values. In 1807, however, the US government abolished the importation of enslaved Africans. Yet around 250,000 Africans were illegally imported from 1808 onward, which may explain the dips and missing values in these years, as these slaves were not officially recorded.
Plot B shows that the number of voyages grows from 1700, peaks in the 1770s, and then declines steadily to its lowest level in the 1860s; note, however, that data are only recorded until 1866. There also appears to be a dip in the 1780s, potentially highlighting missing values.
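The per-year and per-decade counts in figure 4.9 can be sketched in Python (the report itself used R), together with a simple dip check. This is a minimal sketch assuming a list of arrival years, one per voyage. The report identified dips by eye; flag_dips is a hypothetical heuristic we add for illustration, flagging a bin whose count falls below a fraction of the mean of its neighbouring bins.

```python
from collections import Counter


def voyages_per_period(years, width=1):
    """Count voyages per bin of `width` years (1 = per year, 10 = per decade)."""
    return Counter((y // width) * width for y in years if y is not None)


def flag_dips(counts, ratio=0.5):
    """Hypothetical heuristic: flag bins below ratio x the mean of adjacent bins.

    Neighbours are the adjacent keys present in `counts`, so empty bins are
    skipped rather than treated as zero.
    """
    keys = sorted(counts)
    dips = []
    for prev, cur, nxt in zip(keys, keys[1:], keys[2:]):
        if counts[cur] < ratio * (counts[prev] + counts[nxt]) / 2:
            dips.append(cur)
    return dips
```

A flagged dip only says the count is unusually low; as the text above shows, deciding whether it is genuine or missing data still needs outside evidence such as the blackpast events.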
Figure 4.9: Histograms showing the number of voyages per year/decade with year on the x-axis and number of voyages per year (A) and per decade (B) on the y-axis
To explore the exploitation of slaves further, we examine the number of slaves arriving per voyage. Figure 4.10 shows boxplots of the number of slaves arriving per voyage in each decade. The plot shows an increasing trend in the median number of slaves arriving per voyage, peaking in the 1860s. In later decades we also see a wider spread in the number of slaves per voyage and more outliers, with a maximum of 1,700 slaves arriving on a single voyage.
Figure 4.10: Boxplot showing the number of slaves arriving per voyage by decade with year on the x-axis and number of slaves arrived on the y-axis
Figure 4.11 shows the same boxplots, restricted to 1700 onward and with counts above 1,500 removed, to allow a better comparison across decades. From 1700 to 1800 the median count and spread per voyage remain fairly constant. From 1800 onward the median number of slaves per voyage picks up, and very large arrivals per voyage become more prominent. Across all decades there are voyages with zero arrivals, highlighting the presence of missing values.
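The filtering and per-decade summaries behind figure 4.11 can be sketched in Python (the report itself used R). This is a minimal sketch assuming voyage records with year_arrival and n_slaves_arrived fields; the function name is hypothetical. It keeps voyages from 1700 onward with at most 1,500 arrivals and returns each decade's five-number summary. Quartiles use statistics.quantiles with method="inclusive", which to our understanding matches R’s default quantile type 7 used by boxplots.

```python
import statistics
from collections import defaultdict


def decade_summaries(voyages, start=1700, cap=1500):
    """(min, Q1, median, Q3, max) of slaves arrived per voyage, by decade."""
    by_decade = defaultdict(list)
    for v in voyages:
        year, n = v.get("year_arrival"), v.get("n_slaves_arrived")
        if year is None or n is None or year < start or n > cap:
            continue  # missing, too early, or an extreme count
        by_decade[(year // 10) * 10].append(n)
    summary = {}
    for decade, counts in sorted(by_decade.items()):
        counts.sort()
        if len(counts) >= 2:
            q1, med, q3 = statistics.quantiles(counts, n=4, method="inclusive")
        else:
            q1 = med = q3 = counts[0]  # quantiles needs two or more points
        summary[decade] = (counts[0], q1, med, q3, counts[-1])
    return summary
```

This summary omits the whisker and outlier rules of a drawn boxplot; it only reports the box and the range of the filtered data.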
Figure 4.11: Boxplot showing the number of slaves arriving per voyage by decade from 1700 and significantly large values filtered with year on the x-axis and number of slaves arrived on the y-axis
To explore subjects in the slavery and post-slavery eras, we created a new column, slavery, that splits the blackpast events by year, using 1865, the year slavery was abolished, as the dividing year.
The data were then filtered to observations for the United States and grouped by subject.
Figure 4.12: Most relevant subjects in the slavery and post-slavery eras
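The era split and subject grouping can be sketched in Python (the report itself used R). This is a minimal sketch assuming blackpast records with year, country and subject fields. The report does not say which side of the split 1865 itself falls on, so placing it in the post-slavery era is our assumption, as is the country label "United States".

```python
from collections import Counter, defaultdict

ABOLITION_YEAR = 1865  # dividing year used in the report


def subjects_by_era(events, country="United States"):
    """Count subjects per era for one country (assumed label).

    Assumption: years before ABOLITION_YEAR are the slavery era; 1865 and
    later are the post-slavery era.
    """
    counts = defaultdict(Counter)
    for ev in events:
        if ev.get("country") != country or ev.get("year") is None:
            continue
        era = "slavery" if ev["year"] < ABOLITION_YEAR else "post-slavery"
        counts[era][ev["subject"]] += 1
    return counts
```

Calling counts["slavery"].most_common(5) then gives the most frequent subjects of that era, as compared in figure 4.12.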
Unsurprisingly, figure 4.12 reveals that slave laws, resistance to enslavement and racial restrictions were among the most relevant subjects in the slavery era, whereas subjects such as racial violence do not appear in that era, possibly because such violence was considered rather ordinary at the time.
The post-slavery era introduced new subjects to black history, such as black education and the formation of black organizations, and, with voting rights, the black population and black empowerment became major subjects in politics. The black population was also involved in major judicial decisions, major sports and even the press.
The data set filtered in the previous section was then filtered to remove NA values for states, and then to keep only subjects representative of black empowerment, such as Black Organisation, The Civil Rights Movement and The Black Press.
We then plot the count of the selected activities for each state.
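The state-level count behind figure 4.13 can be sketched in Python (the report itself used R). This is a minimal sketch assuming blackpast records with state and subject fields; the subject labels are copied from the text above, and the exact strings in the data may differ. Records with a missing state are dropped before counting.

```python
from collections import Counter

# Labels as written in the report; the data's exact spelling may differ.
EMPOWERMENT = {"Black Organisation", "The Civil Rights Movement", "The Black Press"}


def empowerment_by_state(events, subjects=EMPOWERMENT):
    """Count empowerment-related events per state, ignoring missing states."""
    return Counter(ev["state"] for ev in events
                   if ev.get("state") and ev.get("subject") in subjects)
```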
Figure 4.13: Most active states for black empowerment post slavery
Figure 4.13 reveals that New York, followed by the capital, the District of Columbia (Washington, D.C.), were the most active in promoting black education, the formation of black organizations, the black press and the overall empowerment of African Americans. New York also hosted several events for “The Civil Rights Movement”.
From the analysis conducted, we were able to explore the history of slavery in America and its connection to today’s racism towards African Americans.
In section 4.2, we found that slavery was most heavily exploited in the South region.
Events affecting slavery in America are explored further in section 4.3, where the sentiment analysis shows that most bad events took place in the South region, followed by the Northeast. These recorded events relate to slavery exploitation and to racist behaviour and racial violence (in the post-slavery era) towards African Americans. Hence, we may argue that today’s racism is possibly associated with the long history of enslavement, especially in the South. This finding is intriguing because it is consistent with research by Chae et al. (2015), which describes these two regions (South and Northeast) as the most racist regions in the United States.
Section 4.4 explores American slavery further through slave voyages, finding that slave arrivals peaked in the early 1800s and declined until Juneteenth. Despite this reduction, the number of slaves per voyage kept growing well into the 1800s.
Section 4.5 shows that after the abolition of slavery, the events surrounding the black population shifted from emancipation, resistance to slavery and changes in slave laws to black education, black organizations and black politics, reflecting continued efforts to empower a once-tormented people. Section 4.6 shows the states that made the most effort in doing so, with New York and Washington, D.C. at the forefront.
This report was completed using the following R packages: Wickham et al. (2019), Firke (2020), Wickham, Hester, and Francois (2018), Tierney et al. (2020), Wickham et al. (2020), Auguie (2017), Kassambara (2020), Wickham (2007), Silge and Robinson (2016), Grolemund and Wickham (2011), Sievert (2020), Arnold (2019), and Pedersen (2020).
Arnold, Jeffrey B. 2019. Ggthemes: Extra Themes, Scales and Geoms for ’Ggplot2’. https://CRAN.R-project.org/package=ggthemes.
Auguie, Baptiste. 2017. GridExtra: Miscellaneous Functions for "Grid" Graphics. https://CRAN.R-project.org/package=gridExtra.
“Black Lives Matter.” 2020. Black Lives Matter. https://blacklivesmatter.com/.
BlackPast. 2020. “African American History Timeline.” BlackPast. https://www.blackpast.org/african-american-history-timeline/.
Chae, David H., Sean Clouston, Mark L. Hatzenbuehler, Michael R. Kramer, Hannah L. F. Cooper, Sacoby M. Wilson, Seth I. Stephens-Davidowitz, Robert S. Gold, and Bruce G. Link. 2015. “Association between an Internet-Based Measure of Area Racism and Black Mortality.” Edited by Hajo Zeeb. PLOS ONE 10 (4): e0122963. https://doi.org/10.1371/journal.pone.0122963.
Firke, Sam. 2020. Janitor: Simple Tools for Examining and Cleaning Dirty Data. https://CRAN.R-project.org/package=janitor.
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.
Kassambara, Alboukadel. 2020. Ggpubr: ’Ggplot2’ Based Publication Ready Plots. https://CRAN.R-project.org/package=ggpubr.
Lockhart, P. R. 2018. “Why Celebrating Juneteenth Is More Important Now Than Ever.” Vox. https://www.vox.com/identities/2018/6/19/17476482/juneteenth-holiday-emancipation-african-american-celebration-history.
Mohammad, Saif M. 2018. Word Affect Intensities. Miyazaki, Japan.
Pedersen, Thomas Lin. 2020. Patchwork: The Composer of Plots. https://CRAN.R-project.org/package=patchwork.
Rfordatascience. 2020. “Rfordatascience/Tidytuesday/American Slavery and Juneteenth.” GitHub. https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-06-16/readme.md.
Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, Plotly, and Shiny. Chapman and Hall/CRC. https://plotly-r.com.
Silge, Julia, and David Robinson. 2016. “Tidytext: Text Mining and Analysis Using Tidy Data Principles in R.” JOSS 1 (3). https://doi.org/10.21105/joss.00037.
Tierney, Nicholas, Di Cook, Miles McBain, and Colin Fay. 2020. Naniar: Data Structures, Summaries, and Visualisations for Missing Data. https://CRAN.R-project.org/package=naniar.
Wickham, Hadley. 2007. “Reshaping Data with the reshape Package.” Journal of Statistical Software 21 (12): 1–20. http://www.jstatsoft.org/v21/i12/.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2020. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, Jim Hester, and Romain Francois. 2018. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.